Abstract:
Commercial OLAP systems usually treat OLAP dimensions as static entities. In practice, dimension updates are often necessary in order to adapt the multidimensional database to changing requirements. In earlier work we proposed a temporal multidimensional model and TOLAP, a query language supporting it, accounting for dimension updates and schema evolution at a high level of abstraction. In this paper we present our implementation of the model and the query language. We show how to translate a TOLAP program into SQL, and present a real-life case study of a medical center in Buenos Aires. We apply our implementation to this case study to show how our approach can address problems that occur in real situations and that current non-temporal commercial systems cannot deal with. We present results on query and dimension update performance, and briefly describe a visualization tool that allows editing and running TOLAP queries, performing dimension updates, and browsing dimensions across time.
Abstract:
Commercial OLAP systems usually treat OLAP dimensions as static entities. In practice, dimension updates are often needed to adapt the warehouse to changing requirements. In earlier work, we defined a taxonomy for these dimension updates and a minimal set of operators to perform them. In this paper we present TSOLAP, an OLAP server supporting fully dynamic dimensions. TSOLAP conforms to the OLE DB for OLAP standard, so it can be used by any client application based on this standard, and can use any conformant relational server as a backend. We incorporate dimension update support into MDX, Microsoft's language for OLAP, and introduce TSShow, a visualization tool for dimensions and data cubes. Finally, we present the results of a real-life case study applying TSOLAP to a medium-sized medical center.
Abstract:
Traditional OLAP tools have proven to be successful in analyzing large sets of enterprise data. For today's business dynamics, this highly curated data is sometimes not enough. External data (particularly web data) may be useful to enhance local analysis. In this paper we discuss the extraction of multidimensional data from web sources, and its representation in RDFS. We introduce Open Cubes, an RDFS vocabulary for the specification and publication of multidimensional cubes on the Semantic Web, and show how classical OLAP operations can be implemented over Open Cubes using SPARQL 1.1, without the need to map the multidimensional information to the local database (the usual approach to multidimensional analysis of Semantic Web data). We show that our approach is feasible for the data sizes that can usually be retrieved to enhance local data repositories.
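The Open Cubes vocabulary and the SPARQL 1.1 queries are specific to the paper, but the classical OLAP operation at their core is easy to illustrate in a language-neutral way: a roll-up is a group-by on a coarser dimension level followed by an aggregation of the measures. A minimal Python sketch, with entirely hypothetical fact data and level mapping:

```python
from collections import defaultdict

# Hypothetical fact table: (city, product, sales) tuples at the City level.
facts = [
    ("Paris", "TV", 100),
    ("Lyon", "TV", 50),
    ("Berlin", "Radio", 70),
    ("Paris", "Radio", 30),
]

# Rollup function mapping City to Country, one level up in a Geography dimension.
city_to_country = {"Paris": "France", "Lyon": "France", "Berlin": "Germany"}

def rollup(facts, mapping, agg=sum):
    """Aggregate measures after mapping one dimension to a coarser level."""
    groups = defaultdict(list)
    for city, product, sales in facts:
        groups[(mapping[city], product)].append(sales)
    return {key: agg(vals) for key, vals in groups.items()}

result = rollup(facts, city_to_country)
# ("France", "TV") -> 150, ("France", "Radio") -> 30, ("Germany", "Radio") -> 70
```

In SPARQL 1.1 the same shape is obtained with `GROUP BY` plus an aggregate such as `SUM`, which is what makes implementing roll-up directly over RDF data possible without first loading it into a local multidimensional database.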
Abstract:
The classic Generalized Sequential Patterns (GSP) algorithm returns all frequent sequences present in a database. However, usually only a few of them are interesting from a user's point of view, so post-processing tasks are required to discard uninteresting sequences. To avoid this drawback, languages based on regular expressions (RE) were proposed to restrict frequent sequences to the ones that satisfy user-specified constraints. In all of these languages, REs are applied over items, which limits their applicability in complex real-world situations. We propose a far more powerful language, based on regular expressions, denoted RE-SPaM, where the basic elements are constraints defined over the (temporal and non-temporal) attributes of the items to be mined. Expressions in this language may include attributes, functions over attributes, and variables. We specify the syntax and semantics of RE-SPaM, and present a comprehensive set of examples to illustrate its expressive power. We study in detail how the expressions can be used to prune the resulting sequences in the mining process. In addition, we introduce techniques that allow pruning sequences in the early stages of the process, reducing the need to access the database, making use of the categorization of the attributes that compose the items, and of the automaton that accepts the language generated by the RE. Finally, we present experimental results. Although in this paper we focus on trajectory databases, our approach is general enough to be applied to other settings.
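The core idea of constraining sequences with a regular expression over item *attributes* rather than over item identities can be sketched with Python's standard `re` module. Everything below is hypothetical illustration, not RE-SPaM's actual syntax: each attribute constraint maps an item to a symbol, and a regular expression over those symbols decides whether a candidate sequence is admissible.

```python
import re

# Hypothetical trajectory: each item is a set of (temporal and non-temporal)
# attribute values.
sequence = [
    {"place": "home", "hour": 8},
    {"place": "office", "hour": 9},
    {"place": "office", "hour": 17},
    {"place": "home", "hour": 19},
]

# Attribute constraints play the role of the language's basic elements:
# each predicate over attributes maps an item to one symbol.
def symbolize(item):
    if item["place"] == "home":
        return "h"
    if item["place"] == "office" and item["hour"] < 12:
        return "m"  # at the office in the morning
    return "o"      # at the office otherwise

# A regular expression over the symbols: leave home, stay at the office
# for one or more items, then return home.
pattern = re.compile(r"h[mo]+h")

def satisfies(seq):
    return pattern.fullmatch("".join(symbolize(i) for i in seq)) is not None
```

A mining process can use the automaton behind such an expression for early pruning: a candidate whose symbol string cannot be extended to an accepted word can be discarded before the database is consulted again, which is the spirit of the early-stage pruning described above.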
Abstract:
Commercial OLAP systems usually consider OLAP dimensions as static entities. In practice, dimension updates are often necessary in order to adapt the multidimensional database to changing requirements. We have already defined a taxonomy for these dimension updates in previous work, along with a minimal set of operators to perform them. In this paper, we show the need to keep track of the history of the data warehouse. In order to address this problem, we propose a new (temporal) multidimensional model, along with a query language supporting it. We formally define the model, introduce the language by means of examples, and define its syntax and semantics. Finally, we discuss implementation issues, and how a translation into SQL:99, TSQL2 or other SQL-based languages can proceed.
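The key object in a temporal multidimensional model is a rollup function whose value depends on time: the parent of a dimension member may change as the dimension is updated. A minimal Python sketch of that idea, with a hypothetical Store dimension and made-up validity intervals (not the paper's formalism):

```python
from datetime import date

# Hypothetical temporal rollup for a Store dimension: each entry maps a
# member to its parent together with the interval during which the
# assignment was valid.
rollup_history = {
    "store1": [
        (date(2000, 1, 1), date(2002, 12, 31), "North"),
        (date(2003, 1, 1), date(9999, 12, 31), "South"),
    ],
}

def rollup_at(member, instant):
    """Return the parent of `member` valid at time `instant`, or None."""
    for start, end, parent in rollup_history.get(member, []):
        if start <= instant <= end:
            return parent
    return None
```

A temporal query language over such a model can then ask for aggregations under the dimension structure that held at a given instant, rather than only under the current one.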
Abstract:
Commercial OLAP systems usually consider OLAP dimensions as static entities. In practice, dimension updates are often necessary in order to adapt the multidimensional database to changing requirements. Query languages are then needed to retrieve historical information from a data warehouse. Although many proposals addressing temporal OLAP exist, none of them supports full operation over the World Wide Web. After introducing a temporal data model supporting historical dimensions and fact table versioning, we present a three-tier architecture based on Web services, SOAP and XML, allowing efficient querying over the Web. In this architecture, XML metadata is stored at the application server in the form of XML documents containing the data warehouse structure. This allows addressing most of the requests (navigation through time, updates, and queries) without accessing the data warehouse. We present our implementation, and also discuss query processing issues, such as answering historical queries using materialized views.
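The claim that most navigation requests can be answered from XML metadata alone, without touching the warehouse, can be illustrated with a small sketch. The metadata document and its element names below are hypothetical, not the architecture's actual format:

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document of the kind kept at the application
# server: the warehouse structure only, no fact data.
METADATA = """
<warehouse>
  <dimension name="Geography">
    <level name="City"/>
    <level name="Country"/>
    <level name="All"/>
  </dimension>
</warehouse>
"""

def dimension_levels(xml_text, dimension):
    """Answer a structural (navigation) request from metadata alone."""
    root = ET.fromstring(xml_text)
    for dim in root.findall("dimension"):
        if dim.get("name") == dimension:
            return [lvl.get("name") for lvl in dim.findall("level")]
    return []
```

Only requests that need actual measures (or updates that must be persisted) would then be forwarded from the application server to the warehouse tier.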
Abstract:
The <i>ACM International Workshop on Data Warehousing and Online Analytical Processing (DOLAP)</i> is an annual event that provides an international forum where both researchers and practitioners can share their findings in theoretical foundations, current methodologies, practical experiences, and new research directions in the areas of data warehousing and online analytical processing. The main theme of the seventh DOLAP workshop is the methods, technologies, and tools that enable the Business Intelligence lifecycle, from data modeling and acquisition, to knowledge extraction, to delivery of results, across a wide variety of organizations, including corporate, scientific, government, and healthcare domains.
These proceedings contain the papers selected for presentation at the workshop. We received 29 submissions from 13 different countries. After careful review, the program committee selected 14 papers for presentation at the workshop. The accepted papers were presented in 4 sessions: Physical Design, OLAP, Business Intelligence, and Query/View Processing. A keynote address was given by Kareem M. Saad of IBM Healthcare and Life Sciences on clinical/genomics data warehousing to support information-based medicine. We hope that these proceedings will serve as a valuable reference for data warehousing and OLAP researchers and practitioners.